-
-
Notifications
You must be signed in to change notification settings - Fork 440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
linux: assign CPU temperatures by package/core or CCD #1352
base: main
Are you sure you want to change the base?
Conversation
f61c2d0
to
d4cb5d1
Compare
Fixed build without libsensors. Fixed assignment on RPi. |
Note that the cpu are only parsed once, this takes <0.1ms on my machine. |
I think this is ready to go, should I squash it together sensibly? |
As you like. Just took a quick peek at the list of commits and some suggest that they mostly rectify things changed in previous commits of the PR: Those are the prime subjects to squash in the PR right now. So there's not really much to squash together right now AFAICS. |
@leahneukirchen Anything left to do from your side? If not please mark the PR ready for review. TIA. |
@@ -603,6 +603,100 @@ static void scanCPUFrequencyFromCPUinfo(LinuxMachine* this) { | |||
} | |||
} | |||
|
|||
#ifdef HAVE_SENSORS_SENSORS_H | |||
static void LinuxMachine_fetchCPUTopologyFromCPUinfo(LinuxMachine* this) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function does similar to scanCPUFrequencyFromCPUinfo()
scan /proc/cpuinfo.
Parsing the frequency here should not make a performance difference.
Then scanCPUFrequencyFromSysCPUFreq()
can be changed to write its values first to a local array and on success override the frequencies from that array.
Thus /proc/cpuinfo is not scanned multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we need to move the ifdef HAVE_SENSORS_SENSORS_H
into that function, which makes it harder to read I think.
@@ -674,6 +772,13 @@ Machine* Machine_new(UsersTable* usersTable, uid_t userId) { | |||
// Initialize CPU count | |||
LinuxMachine_updateCPUcount(this); | |||
|
|||
#ifdef HAVE_SENSORS_SENSORS_H | |||
// Fetch CPU topology | |||
LinuxMachine_fetchCPUTopologyFromCPUinfo(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe LinuxMachine_fetchCPUTopologyFromCPUinfo()
can return whether the topology changed and then LibSensors_countCCDs()
and LinuxMachine_assignCCDs()
get only called on change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what this means, this code runs only once already?
@leahneukirchen Do you want to implement the three review comments from @cgzones ? |
I somehow dislike that the sensors are iterated twice: once in |
Seconded, libsensors is slow |
Tested on my notebook with Alder Lake CPU... It shows temperatures for all cores at least. |
Tested on an AMD EPYC 7F52 (16 core, 32 threads, 16x1 CCX), seems to work. At least plausible looking temperatures are now shown for all cores. |
AFAICS this is waiting for some refactoring work, but that work is yet not fully specified/explained. @cgzones can you please elaborate the open discussion threads?. |
Note that sensors are iterated twice on first start only, later only to read the temperatures. (Patches welcome, I find it hard to rewrite this to use one pass.) |
If this is a one-time event it sounds like a reasonable tradeoff, compared to (probably) more complex code. I have not looked at it myself, though. |
Oh, this crashes with a segmentation fault if |
Sigh. With Kernel 6.9 there seems to be a bug in the logic now, my i7-1355U isn't mapped correctly anymore... |
With kernel 6.9.5 I get:
Note that the tempID are not sequential. |
Thanks a lot for the quick fix! |
coreTempCount++; | ||
} | ||
} | ||
ccdID++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the number from the label here, e.g. Tccd5
? The features might not be sorted or even monotonically increasing by 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a total count of CCD however, too...
On my laptop with a Ryzen 5500U, there's only a single
According to the kernel doc's quote of the AMD manual:
But it appears somewhat representative, at least here, so it might make sense to include it. |
I just checked on a server with AMD EPYC 7502, which supposedly (no idea whether it's correct!) has 2 sockets x 4 CCD x 2 CCX x 4 cores = 64 cores. The sockets are exposed as separate k10temp instances:
|
Just having one sensor works already, no? |
Depends on what you mean by "works". It shows a temperature value for each core, but I have no idea whether it matches. FWICT both sensors and its CCDs are read in whatever order sensors provides, which might or might not be related to the CPU core numbers... |
I tested on a |
This need a rebase on top of 6de810f... |
Closes htop-dev#806. Closes htop-dev#1048. Closes htop-dev#1176. Closes htop-dev#1335. Addresses htop-dev#879 (on Linux).
Sensors can be numbered in a non-contigouos way, but we must still scan all labels to get a full account of the measured temperatures.
965fe58
to
8fdd40c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@leahneukirchen : What's still pending with the comment from @Vogtinator ?
Closes #806.
Closes #1048.
Closes #1176.
Closes #1335.
Addresses #879 (on Linux).